A Low Overhead Recovery Technique Using Quasi-Synchronous Checkpointing

Authors

  • D. Manivannan
  • Mukesh Singhal
Abstract

In this paper, we propose a quasi-synchronous checkpointing algorithm and a low-overhead recovery algorithm based on it. The checkpointing algorithm preserves process autonomy by allowing processes to take checkpoints asynchronously, and it uses communication-induced checkpoint coordination to advance the recovery line, which helps bound rollback propagation during recovery. Thus, it combines the simplicity and low overhead of asynchronous checkpointing with the recovery-time advantages of synchronous checkpointing. Checkpointing involves no extra messages, and the additional checkpointing overhead is nominal. The algorithm ensures that a recovery line consistent with the latest checkpoint of any process exists at all times. The recovery algorithm exploits this feature to restore the system to a state consistent with the latest checkpoint of a failed process. The recovery algorithm is free of the domino effect: a failed process only needs to roll back to its latest checkpoint and request the other processes to roll back to a consistent checkpoint. To avoid the domino effect, it uses selective pessimistic message logging at the receiver end. Recovery is asynchronous for a single process failure. Neither the recovery algorithm nor the checkpointing algorithm requires the channels to be FIFO. We do not use vector timestamps to determine dependencies between checkpoints, since vector timestamps generally result in high message overhead during failure-free operation.
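
The abstract does not spell out the coordination mechanism, so the following is only a minimal sketch of one way communication-induced coordination can work without extra messages or vector timestamps. It assumes each process keeps a single integer checkpoint sequence number and piggybacks it on every application message. The names `Process`, `basic_checkpoint`, `receive` and `recovery_line` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# A saved checkpoint: (sequence number, kind, copy of application state).
Checkpoint = Tuple[int, str, List[Any]]


@dataclass
class Process:
    """Toy process model (assumption: one integer sequence number per process)."""
    pid: int
    sn: int = 0                                   # sequence number of latest checkpoint
    state: List[Any] = field(default_factory=list)
    checkpoints: List[Checkpoint] = field(default_factory=list)

    def __post_init__(self) -> None:
        self._save("initial")

    def _save(self, kind: str) -> None:
        # Store a copy so later state changes do not alter the checkpoint.
        self.checkpoints.append((self.sn, kind, list(self.state)))

    def basic_checkpoint(self) -> None:
        """Checkpoint taken autonomously, without coordinating with anyone."""
        self.sn += 1
        self._save("basic")

    def send(self, payload: Any) -> Tuple[int, Any]:
        """Piggyback the sequence number on the message; no extra message is sent."""
        return (self.sn, payload)

    def receive(self, message: Tuple[int, Any]) -> None:
        msg_sn, payload = message
        if msg_sn > self.sn:
            # Forced checkpoint BEFORE processing the message, so the
            # checkpoint does not record a receive whose send lies after
            # the sender's checkpoint with the same sequence number.
            self.sn = msg_sn
            self._save("forced")
        self.state.append(payload)


def recovery_line(processes: List[Process], sn: int) -> Dict[int, Optional[Checkpoint]]:
    """Pick, per process, the first checkpoint with sequence number >= sn.

    None means the process has no such checkpoint yet (assumption: it would
    then contribute its current state).
    """
    line: Dict[int, Optional[Checkpoint]] = {}
    for p in processes:
        line[p.pid] = next((c for c in p.checkpoints if c[0] >= sn), None)
    return line
```

Under these assumptions, a receiver that sees a larger sequence number catches up with a forced checkpoint, which is how the recovery line can advance even though each process otherwise checkpoints on its own schedule. The selective message logging mentioned in the abstract is not modeled here.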


Similar resources

Comprehensive Low-overhead Process Recovery Based on Quasi-synchronous Checkpointing

In this paper, we propose a low-overhead recovery algorithm based on a quasi-synchronous checkpointing algorithm. The checkpointing algorithm preserves process autonomy by allowing processes to take checkpoints asynchronously, and it uses communication-induced checkpoint coordination to advance the recovery line, which helps bound rollback propagation during recovery. Thus, it has the easen...


Failure Recovery based on Quasi-Synchronous Checkpointing in Mobile Computing Systems

Mobile computing systems are expected to revolutionize the way computers are used. Mobile hosts have small memory, a relatively slow processor and low power batteries, and communicate over low bandwidth wireless communication links. In this paper, we address the problem of failure recovery in mobile computing systems. Any recovery method for mobile computing systems should take into considerati...


Quasi-synchronous Checkpointing: Models, Characterization, and Classification

Checkpointing algorithms are classified as synchronous and asynchronous in the literature. In synchronous checkpointing, processes synchronize their checkpointing activities so that a globally consistent set of checkpoints is always maintained in the system. Synchronizing checkpointing activity involves message overhead and process execution may have to be suspended during the checkpointing coor...


Efficient Checkpoint-based Failure Recovery Techniques in Mobile Computing Systems

Conventional distributed and domino effect-free failure recovery techniques are inappropriate for mobile computing systems because each mobile host is forced to take a new checkpoint (based on coordinated checkpointing). Otherwise, multiple local checkpoints may need to be stored in stable storage (based on communication-induced checkpointing). Hence, this investigation presents a novel domino ...


A New Checkpointing Approach for Mobile Distributed System

In this paper, we introduce a weighted checkpointing approach for the mobile distributed computing system (MDCS) that significantly reduces checkpointing overheads on mobile nodes. Checkpoint protocols proposed so far in the literature for MDCS are either coordinated, log based or quasi-synchronous. Coordinated checkpointing requires extra synchronization messages and may block the underlying c...




Publication date: 1996